Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study

Authors

  • Jianbin Fang
  • Ana Lucia Varbanescu
  • Henk Sips
Abstract

In this paper, we analyze the trade-offs encountered when minimizing the total execution time of rake-based applications on GPUs. We use clustering data streams as a case study and present a rake-based implementation of it that uses memory more efficiently. To maximize performance across different problem sizes and architectures, we propose a model-based auto-tuning solution. Experimental results show that our fully optimized implementation runs 2.1x and 1.4x faster than the native OpenCL implementation on the NVIDIA GTX480 and AMD HD5870, respectively; it also achieves a 1.4x to 3.3x speedup over the original CUDA implementation on the GTX480.

Similar resources

Sociological Impact of Using Digital (Web-based) Analyses on Performance Measurement and Optimization of Digital Marketing among Young Managers (Case study: Digital-based Companies in Tehran)

This research aims to study the effect of using digital (web-based) analyses in performance measurement and optimization of digital marketing in digital-based companies in Tehran. The data collection tool was a researcher-made questionnaire. A panel of experts and supervisor were asked to measure the validity of the questionnaire. For reliability analysis of this tool, Cronbach’s alpha test was...

Forward Link Performance of Site Selection Diversity Transmit with Antenna Diversity and Rake Combining in DS-CDMA Mobile Radio

In this paper, it is theoretically shown that the optimal solution to maximize the DS-CDMA forward link capacity under the condition of constant total transmit power is to transmit only from the best base station (BS) that has the maximum channel gain (this is called site selection diversity transmit (SSDT)). This theoretical analysis is confirmed by the Monte-Carlo computer simulation...

A Novel Integrated Approach to Oil Production Optimization and Limiting the Water Cut Using Intelligent Well Concept: Using Case Studies

Intelligent well technology has provided facility for real time production control through use of subsurface instrumentation. Early detection of water production allows for a prompt remedial action. Effective water control requires the appropriate performance of individual devices in wells on maintaining the equilibrium between water and oil production over the entire field life. However, there...

An approach to Improve Particle Swarm Optimization Algorithm Using CUDA

The time consumption in solving computationally heavy problems has always been a concern for computer programmers. Due to simplicity of its implementation, the PSO (Particle Swarm Optimization) is a suitable meta-heuristic algorithm for solving computationally heavy problems. However, despite the simplicity, the algorithm is inefficient for solving real computationally heavy problems but the pr...

A Compiler Framework for Optimization of Affine Loop Nests for General Purpose Computations on GPUs

GPUs are a class of specialized parallel architectures with tremendous computational power. The new Compute Unified Device Architecture (CUDA) programming model from NVIDIA facilitates programming of general purpose applications on NVIDIA GPUs. However, there are various performance-influencing factors specific to GPU architectures that need to be accurately characterized to effectively utilize...

Publication date: 2011